In this section, we list several works that modify the structure of BNNs, contributing to better performance or convergence of the network. XNOR-Net and Bi-Real Net make minor adjustments to the original networks, while MCN proposes new filters and convolutional operations. The loss function is also adjusted according to the new filters, as introduced in Section 1.1.5.
1.1.5 Loss Design
During neural network optimization, the loss function estimates the difference between a model's predicted values and the real values. Classical loss functions, such as the least-squares loss and the cross-entropy loss, are widely used in regression and classification problems. This section reviews the specific loss functions used in BNNs.
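For concreteness, a minimal PyTorch illustration of the two classical losses just mentioned; the tensors are toy values chosen only for the example:

```python
import torch
import torch.nn.functional as F

# Least-squares (mean squared error) loss for regression.
pred = torch.tensor([2.5, 0.0, 2.0])
target = torch.tensor([3.0, -0.5, 2.0])
mse = F.mse_loss(pred, target)

# Cross-entropy loss for classification: raw logits plus integer class labels.
logits = torch.tensor([[1.2, -0.3, 0.4], [0.1, 2.0, -1.0]])
labels = torch.tensor([0, 1])
ce = F.cross_entropy(logits, labels)
```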
MCNs [236] propose a novel loss function that considers filter loss, center loss, and softmax loss in an end-to-end framework. The loss function in MCNs consists of two parts:
$$
L = L_M + L_S.
\tag{1.13}
$$
The first part, $L_M$, is:
$$
L_M = \frac{\theta}{2}\sum_{i,l}\left\|C_i^l - \hat{C}_i^l \circ M^l\right\|^2 + \frac{\lambda}{2}\sum_m \left\|f_m(\hat{C},\vec{M}) - \bar{f}(\hat{C},\vec{M})\right\|^2,
\tag{1.14}
$$
where $C$ denotes the full-precision weights, $\hat{C}$ the binarized weights, $M$ the M-Filters defined in Section 1.1.4, $f_m$ the feature map of the last convolutional layer for the $m$-th sample, and $\bar{f}$ the class-specific mean feature map of previous samples. The first entry of $L_M$ represents the filter loss, while the second entry calculates the center loss. $L_S$ is a conventional loss function, such as the softmax loss.
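To make the two entries of Eq. (1.14) concrete, here is a schematic PyTorch rendering; the list-of-layers layout, the element-wise product standing in for $\circ$, and the argument names are illustrative assumptions, not MCNs' actual code:

```python
import torch

def mcn_loss_LM(C, C_hat, M, feats, class_means, theta, lam):
    """Sketch of Eq. (1.14): filter loss plus center loss.

    C, C_hat    : lists over layers l of full-precision / binarized
                  filters C_i^l and C^_i^l (stacked over i)
    M           : list over layers of M-Filters, broadcastable onto C[l]
    feats       : (batch, ...) last-layer feature maps f_m
    class_means : (batch, ...) class-specific mean feature map f-bar
                  matched to each sample's class
    """
    # Filter loss: squared reconstruction error of each filter from its
    # binarized version modulated by the layer's M-Filter.
    filter_loss = sum(((C[l] - C_hat[l] * M[l]) ** 2).sum()
                      for l in range(len(C)))
    # Center loss: squared distance of each feature map from its class mean.
    center_loss = ((feats - class_means) ** 2).sum()
    return 0.5 * theta * filter_loss + 0.5 * lam * center_loss
```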
PCNNs [77] propose a projection loss for discrete backpropagation, and are the first to define the quantization of the input variable as a projection onto a discrete set, from which the projection loss is derived.
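As a rough illustration of the idea, a projection onto a discrete set together with a squared-distance penalty might look as follows; the two-level set and the mean reduction are placeholder choices, not PCNNs' exact formulation:

```python
import torch

def project(x, levels):
    """Project each entry of x onto the nearest value in a discrete set."""
    # levels: 1-D tensor of quantization levels, e.g. torch.tensor([-1., 1.])
    idx = (x.unsqueeze(-1) - levels).abs().argmin(dim=-1)
    return levels[idx]

def projection_loss(x, levels):
    """Penalize the squared distance between x and its projection."""
    return ((x - project(x, levels)) ** 2).mean()
```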
Our BONNs [287] propose a Bayesian-optimized 1-bit CNN model that significantly improves the performance of 1-bit CNNs. BONNs incorporate the prior distributions of full-precision kernels, features, and filters into a Bayesian framework to construct 1-bit CNNs comprehensively, end to end. Denoting the quantization error by $y$ and the full-precision weights by $x$, BONNs maximize $p(x|y)$ to optimize $x$ for quantization and thus minimize the reconstruction error. Since the distribution of $x$ is known, Bayes' rule gives $p(x|y) \propto p(y|x)\,p(x)$, so this optimization problem can be converted to a maximum a posteriori (MAP) estimation. Feature quantization is handled in the same way. The resulting Bayesian loss is:
$$
\begin{aligned}
L_B = {} & \frac{\lambda}{2}\sum_{l=1}^{L}\sum_{i=1}^{C_o^l}\sum_{n=1}^{C_i^l}\Big\{\left\|\hat{k}_n^{l,i} - w^l \circ k_n^{l,i}\right\|_2^2 \\
& + v\,(k_{n+}^{l,i} - \mu_{i+}^{l})^{T}(\Psi_{i+}^{l})^{-1}(k_{n+}^{l,i} - \mu_{i+}^{l}) \\
& + v\,(k_{n-}^{l,i} - \mu_{i-}^{l})^{T}(\Psi_{i-}^{l})^{-1}(k_{n-}^{l,i} - \mu_{i-}^{l}) + v\log\big(\det(\Psi^{l})\big)\Big\} \\
& + \frac{\theta}{2}\sum_{m=1}^{M}\Big\{\left\|f_m - c_m\right\|^2 + \sum_{n=1}^{N_f}\big[\sigma_{m,n}^{-2}(f_{m,n} - c_{m,n})^2 + \log(\sigma_{m,n}^2)\big]\Big\}.
\end{aligned}
\tag{1.15}
$$
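The structure of Eq. (1.15) is easier to see in code. Below is a heavily simplified, single-layer PyTorch sketch: the covariances $\Psi$ are assumed diagonal and replaced by per-mode variance vectors, shapes are flattened, and every function and tensor name is illustrative rather than taken from BONNs' released implementation:

```python
import torch

def bayesian_kernel_loss(k, k_hat, w, mu_pos, var_pos, mu_neg, var_neg, v, lam):
    """Kernel part of Eq. (1.15) for one layer, diagonal-covariance case.

    k      : (n, d) full-precision kernels k_n
    k_hat  : (n, d) binarized kernels k^_n
    w      : (d,)   learned scale w^l (element-wise product models the o operator)
    mu_pos, var_pos : (d,) mean / variance of the positive-weight mode
    mu_neg, var_neg : (d,) mean / variance of the negative-weight mode
    """
    # Reconstruction term ||k^ - w o k||_2^2.
    recon = ((k_hat - w * k) ** 2).sum()
    # Positive and negative parts k_{n+}, k_{n-} of each kernel.
    pos, neg = k.clamp(min=0), k.clamp(max=0)
    # Diagonal stand-in for log det(Psi), repeated once per kernel n.
    logdet = torch.log(var_pos).sum() + torch.log(var_neg).sum()
    prior = v * (((pos - mu_pos) ** 2 / var_pos).sum()
                 + ((neg - mu_neg) ** 2 / var_neg).sum()
                 + k.shape[0] * logdet)
    return 0.5 * lam * (recon + prior)

def bayesian_feature_loss(f, c, sigma2, theta):
    """Feature part of Eq. (1.15): center term plus variance-weighted term.

    f      : (M, Nf) features f_m;  c : (M, Nf) class centers c_m
    sigma2 : (M, Nf) per-feature variances sigma^2_{m,n}
    """
    center = ((f - c) ** 2).sum()
    weighted = (((f - c) ** 2) / sigma2 + torch.log(sigma2)).sum()
    return 0.5 * theta * (center + weighted)
```

Summing the kernel term over all layers and adding the feature term yields the full $L_B$ under these simplifying assumptions.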